AITopics | Prince of Wales-Hyder Census Area

Hyperbolic Image-Text Representations

Desai, Karan, Nickel, Maximilian, Rajpurohit, Tanmay, Johnson, Justin, Vedantam, Ramakrishna

arXiv.org Artificial Intelligence, Jun-5-2023

Visual and linguistic concepts naturally organize themselves in a hierarchy, where a textual concept "dog" entails all images that contain dogs. Although this hierarchy is intuitive, current large-scale vision and language models such as CLIP do not explicitly capture it. We propose MERU, a contrastive model that yields hyperbolic representations of images and text. Hyperbolic spaces have suitable geometric properties to embed tree-like data, so MERU can better capture the underlying hierarchy in image-text datasets. Our results show that MERU learns a highly interpretable and structured representation space while being competitive with CLIP's performance on standard multi-modal tasks like image classification and image-text retrieval.
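
To make "hyperbolic representations" concrete, here is a minimal sketch of how a distance between two embeddings could be computed in hyperbolic space. It assumes the Lorentz (hyperboloid) model with curvature -curv, a choice the abstract above does not state; the function names (exp_map_origin, time_component, lorentz_distance) and the sample vectors are hypothetical and not taken from the MERU code. A contrastive model of this kind could use the negative distance as its image-text similarity score.

```python
import math


def exp_map_origin(v, curv=1.0):
    """Lift a Euclidean (tangent) vector onto the hyperboloid.

    Returns only the space components; the time component is
    recovered from them by time_component(). Assumes curvature -curv.
    """
    r = math.sqrt(sum(a * a for a in v))
    if r == 0.0:
        return [0.0 for _ in v]
    scale = math.sinh(math.sqrt(curv) * r) / (math.sqrt(curv) * r)
    return [scale * a for a in v]


def time_component(x, curv=1.0):
    """Time coordinate implied by the hyperboloid constraint."""
    return math.sqrt(1.0 / curv + sum(a * a for a in x))


def lorentz_distance(x, y, curv=1.0):
    """Geodesic distance between two points given by their space components."""
    # Lorentzian inner product: space dot product minus product of time parts.
    inner = sum(a * b for a, b in zip(x, y))
    inner -= time_component(x, curv) * time_component(y, curv)
    # Clamp guards against rounding pushing the argument just below 1.
    return math.acosh(max(1.0, -curv * inner)) / math.sqrt(curv)


# Hypothetical embeddings: a generic concept near the origin,
# a more specific one farther out.
generic = exp_map_origin([0.1, 0.0])
specific = exp_map_origin([1.5, 0.5])
similarity = -lorentz_distance(generic, specific)
```

In this model the distance from the origin to exp_map_origin(v) equals the Euclidean norm of v, while the space available at a given radius grows exponentially, which is the property that makes room for tree-like data.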

artificial intelligence, machine learning, natural language

arXiv:2304.09172

Country:

Europe > Austria > Vienna (0.14)
North America > United States > New York > New York County > New York City (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report > New Finding (0.86)

Industry:

Leisure & Entertainment (1.00)
Consumer Products & Services (1.00)
Energy (0.67)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
